Table of Contents¶

  • 1. Introduction
    • 1.1. Feature Names and Descriptions
  • 2. Data Exploration
    • 2.1. Data Sample and Info
  • 3. Handling Missing Values
  • 4. Feature Engineering
    • 4.1. Creating New Features
  • 5. Findings and Recommendations
  • 6. Conclusion

Data source: https://www.kaggle.com/datasets/russellyates88/suicide-rates-overview-1985-to-2016

1. Introduction¶

Suicide is a critical public-health issue that affects individuals and communities worldwide. In this study, we analyze a dataset of suicide counts and rates broken down by country, year, sex, and age group, together with demographic and economic indicators (population, GDP, and HDI), to identify patterns in suicide rates between 1985 and 2016. By understanding which factors are associated with higher risk, we aim to inform prevention strategies that promote mental well-being and help communities build supportive environments.

1.1 Feature Names and Descriptions¶

  • country: The name of the country where the data is recorded.
  • year: The year in which the data is recorded.
  • sex: The sex (male or female) for which the data is reported.
  • age: The age group to which the data corresponds (e.g., 15-24 years).
  • suicides_no: The number of suicides reported for the group.
  • population: The population of the group (one country, year, sex, and age group).
  • suicides/100k pop: The number of suicides per 100,000 people in the group.
  • country-year: The country name and year concatenated (e.g., Albania1987).
  • HDI for year: The country's Human Development Index for that year.
  • gdp_for_year ($): The country's gross domestic product (GDP) for that year, in dollars.
  • gdp_per_capita ($): The country's GDP per capita for that year, i.e., economic output per person.
  • generation: The generational group to which individuals belong.

Import Libraries¶

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import seaborn as sns

%matplotlib inline  
sns.set(rc={'figure.figsize': [11, 4]}, font_scale=0.7)

2. Data Exploration¶

2.1 Data Sample and Info¶

In [2]:
df = pd.read_csv('master.csv')
df.sample(10)
Out[2]:
country year sex age suicides_no population suicides/100k pop country-year HDI for year gdp_for_year ($) gdp_per_capita ($) generation
24581 Sweden 2004 female 25-34 years 32 570904 5.61 Sweden2004 NaN 381,705,425,302 44831 Generation X
12743 Israel 2012 male 25-34 years 57 568580 10.02 Israel2012 0.890 257,296,579,579 36263 Millenials
8930 Finland 2003 male 35-54 years 376 783442 47.99 Finland2003 NaN 171,071,106,095 34701 Boomers
7242 Czech Republic 2002 male 55-74 years 282 921139 30.61 Czech Republic2002 NaN 81,910,771,994 8399 Silent
7654 Denmark 2013 female 25-34 years 16 322819 4.96 Denmark2013 0.923 343,584,385,594 64831 Millenials
5581 Chile 2011 female 35-54 years 158 2367343 6.67 Chile2011 0.821 252,251,992,029 15854 Generation X
25597 Trinidad and Tobago 2008 female 55-74 years 4 95285 4.20 Trinidad and Tobago2008 NaN 27,870,257,894 22857 Silent
22022 Serbia 2002 male 15-24 years 69 516180 13.37 Serbia2002 NaN 16,116,843,146 2258 Millenials
21357 Saint Lucia 1991 male 25-34 years 2 10359 19.31 Saint Lucia1991 NaN 513,753,818 4194 Boomers
27441 Uruguay 2005 female 75+ years 18 131946 13.64 Uruguay2005 0.756 17,362,857,684 5655 Silent
In [3]:
df.shape
Out[3]:
(27820, 12)
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27820 entries, 0 to 27819
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   country             27820 non-null  object 
 1   year                27820 non-null  int64  
 2   sex                 27820 non-null  object 
 3   age                 27820 non-null  object 
 4   suicides_no         27820 non-null  int64  
 5   population          27820 non-null  int64  
 6   suicides/100k pop   27820 non-null  float64
 7   country-year        27820 non-null  object 
 8   HDI for year        8364 non-null   float64
 9    gdp_for_year ($)   27820 non-null  object 
 10  gdp_per_capita ($)  27820 non-null  int64  
 11  generation          27820 non-null  object 
dtypes: float64(2), int64(4), object(6)
memory usage: 2.5+ MB
In [5]:
df.describe()
Out[5]:
year suicides_no population suicides/100k pop HDI for year gdp_per_capita ($)
count 27820.000000 27820.000000 2.782000e+04 27820.000000 8364.000000 27820.000000
mean 2001.258375 242.574407 1.844794e+06 12.816097 0.776601 16866.464414
std 8.469055 902.047917 3.911779e+06 18.961511 0.093367 18887.576472
min 1985.000000 0.000000 2.780000e+02 0.000000 0.483000 251.000000
25% 1995.000000 3.000000 9.749850e+04 0.920000 0.713000 3447.000000
50% 2002.000000 25.000000 4.301500e+05 5.990000 0.779000 9372.000000
75% 2008.000000 131.000000 1.486143e+06 16.620000 0.855000 24874.000000
max 2016.000000 22338.000000 4.380521e+07 224.970000 0.944000 126352.000000
In [6]:
df.describe(include='O')
Out[6]:
country sex age country-year gdp_for_year ($) generation
count 27820 27820 27820 27820 27820 27820
unique 101 2 6 2321 2321 6
top Mauritius male 15-24 years Albania1987 2,156,624,900 Generation X
freq 382 13910 4642 12 12 6408
In [7]:
# Drop 'country-year': it is just 'country' and 'year' concatenated, so it adds no information
df.drop('country-year', axis=1, inplace=True)
In [8]:
df.head()
Out[8]:
country year sex age suicides_no population suicides/100k pop HDI for year gdp_for_year ($) gdp_per_capita ($) generation
0 Albania 1987 male 15-24 years 21 312900 6.71 NaN 2,156,624,900 796 Generation X
1 Albania 1987 male 35-54 years 16 308000 5.19 NaN 2,156,624,900 796 Silent
2 Albania 1987 female 15-24 years 14 289700 4.83 NaN 2,156,624,900 796 Generation X
3 Albania 1987 male 75+ years 1 21800 4.59 NaN 2,156,624,900 796 G.I. Generation
4 Albania 1987 male 25-34 years 9 274300 3.28 NaN 2,156,624,900 796 Boomers
In [9]:
# Normalize column names: strip surrounding spaces (e.g. ' gdp_for_year ($) ') and replace inner spaces with underscores
df.columns = df.columns.str.strip().str.replace(' ','_')
In [10]:
df.columns
Out[10]:
Index(['country', 'year', 'sex', 'age', 'suicides_no', 'population',
       'suicides/100k_pop', 'HDI_for_year', 'gdp_for_year_($)',
       'gdp_per_capita_($)', 'generation'],
      dtype='object')
In [11]:
# checking for null values
df.isna().sum()
Out[11]:
country                   0
year                      0
sex                       0
age                       0
suicides_no               0
population                0
suicides/100k_pop         0
HDI_for_year          19456
gdp_for_year_($)          0
gdp_per_capita_($)        0
generation                0
dtype: int64
In [12]:
df.duplicated().sum()
Out[12]:
0
In [13]:
df.nunique()
Out[13]:
country                 101
year                     32
sex                       2
age                       6
suicides_no            2084
population            25564
suicides/100k_pop      5298
HDI_for_year            305
gdp_for_year_($)       2321
gdp_per_capita_($)     2233
generation                6
dtype: int64

Exploration (Country) Feature¶

In [14]:
df['country'].unique()
Out[14]:
array(['Albania', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Barbados', 'Belarus', 'Belgium', 'Belize',
       'Bosnia and Herzegovina', 'Brazil', 'Bulgaria', 'Cabo Verde',
       'Canada', 'Chile', 'Colombia', 'Costa Rica', 'Croatia', 'Cuba',
       'Cyprus', 'Czech Republic', 'Denmark', 'Dominica', 'Ecuador',
       'El Salvador', 'Estonia', 'Fiji', 'Finland', 'France', 'Georgia',
       'Germany', 'Greece', 'Grenada', 'Guatemala', 'Guyana', 'Hungary',
       'Iceland', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan',
       'Kazakhstan', 'Kiribati', 'Kuwait', 'Kyrgyzstan', 'Latvia',
       'Lithuania', 'Luxembourg', 'Macau', 'Maldives', 'Malta',
       'Mauritius', 'Mexico', 'Mongolia', 'Montenegro', 'Netherlands',
       'New Zealand', 'Nicaragua', 'Norway', 'Oman', 'Panama', 'Paraguay',
       'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
       'Republic of Korea', 'Romania', 'Russian Federation',
       'Saint Kitts and Nevis', 'Saint Lucia',
       'Saint Vincent and Grenadines', 'San Marino', 'Serbia',
       'Seychelles', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa',
       'Spain', 'Sri Lanka', 'Suriname', 'Sweden', 'Switzerland',
       'Thailand', 'Trinidad and Tobago', 'Turkey', 'Turkmenistan',
       'Ukraine', 'United Arab Emirates', 'United Kingdom',
       'United States', 'Uruguay', 'Uzbekistan'], dtype=object)
In [15]:
px.histogram(df['country'])

Exploration (Year) Feature¶

In [16]:
df['year'].unique()
Out[16]:
array([1987, 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
       2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
       1985, 1986, 1990, 1991, 2012, 2013, 2014, 2015, 2011, 2016],
      dtype=int64)
In [17]:
sns.countplot(data=df, x='year')
Out[17]:
<AxesSubplot: xlabel='year', ylabel='count'>

Exploration (Sex) Feature¶

In [18]:
df['sex'].unique()
Out[18]:
array(['male', 'female'], dtype=object)
In [19]:
sns.countplot(data=df, x='sex')
Out[19]:
<AxesSubplot: xlabel='sex', ylabel='count'>

Exploration (age) Feature¶

In [20]:
df['age'].unique()
Out[20]:
array(['15-24 years', '35-54 years', '75+ years', '25-34 years',
       '55-74 years', '5-14 years'], dtype=object)
In [21]:
sns.countplot(data=df, x='age')
Out[21]:
<AxesSubplot: xlabel='age', ylabel='count'>

Exploration (suicides_no) Feature¶

In [22]:
df['suicides_no'].unique()
Out[22]:
array([  21,   16,   14, ..., 5503, 4359, 2872], dtype=int64)
In [23]:
sns.boxplot(data=df, x='suicides_no')
Out[23]:
<AxesSubplot: xlabel='suicides_no'>

Exploration (population) Feature¶

In [24]:
df['population'].unique()
Out[24]:
array([ 312900,  308000,  289700, ..., 2762158, 2631600, 1438935],
      dtype=int64)
In [25]:
sns.boxplot(x=df['population'])
Out[25]:
<AxesSubplot: xlabel='population'>

Exploration (suicides/100k pop) Feature¶

In [26]:
df['suicides/100k_pop'].unique()
Out[26]:
array([ 6.71,  5.19,  4.83, ..., 47.86, 40.75, 26.61])
In [27]:
sns.boxplot(x=df['suicides/100k_pop'])
Out[27]:
<AxesSubplot: xlabel='suicides/100k_pop'>

Exploration (HDI for year) Feature¶

In [28]:
df['HDI_for_year'].unique()
Out[28]:
array([  nan, 0.619, 0.656, 0.695, 0.722, 0.781, 0.783, 0.694, 0.705,
       0.731, 0.762, 0.775, 0.811, 0.818, 0.831, 0.833, 0.836, 0.632,
       0.605, 0.648, 0.721, 0.723, 0.728, 0.733, 0.865, 0.882, 0.898,
       0.927, 0.93 , 0.932, 0.933, 0.935, 0.764, 0.794, 0.815, 0.853,
       0.879, 0.881, 0.884, 0.885, 0.609, 0.64 , 0.778, 0.78 , 0.774,
       0.786, 0.727, 0.816, 0.819, 0.817, 0.821, 0.824, 0.7  , 0.716,
       0.753, 0.765, 0.793, 0.785, 0.683, 0.796, 0.798, 0.806, 0.851,
       0.874, 0.866, 0.883, 0.886, 0.889, 0.888, 0.89 , 0.644, 0.664,
       0.701, 0.71 , 0.711, 0.715, 0.724, 0.576, 0.608, 0.702, 0.737,
       0.742, 0.746, 0.752, 0.755, 0.686, 0.696, 0.713, 0.749, 0.773,
       0.779, 0.782, 0.827, 0.849, 0.861, 0.867, 0.892, 0.903, 0.909,
       0.91 , 0.912, 0.654, 0.699, 0.788, 0.814, 0.83 , 0.832, 0.573,
       0.596, 0.629, 0.679, 0.706, 0.718, 0.72 , 0.623, 0.652, 0.682,
       0.704, 0.75 , 0.756, 0.761, 0.766, 0.807, 0.653, 0.685, 0.73 ,
       0.776, 0.772, 0.768, 0.769, 0.8  , 0.848, 0.852, 0.85 , 0.847,
       0.863, 0.868, 0.87 , 0.862, 0.902, 0.908, 0.92 , 0.921, 0.923,
       0.631, 0.645, 0.665, 0.674, 0.698, 0.717, 0.732, 0.522, 0.566,
       0.603, 0.638, 0.658, 0.662, 0.666, 0.719, 0.838, 0.855, 0.859,
       0.857, 0.869, 0.878, 0.741, 0.825, 0.887, 0.672, 0.735, 0.74 ,
       0.747, 0.754, 0.801, 0.906, 0.911, 0.915, 0.916, 0.759, 0.799,
       0.864, 0.739, 0.483, 0.513, 0.552, 0.611, 0.617, 0.624, 0.626,
       0.627, 0.542, 0.581, 0.618, 0.63 , 0.634, 0.802, 0.823, 0.828,
       0.826, 0.896, 0.897, 0.899, 0.77 , 0.803, 0.895, 0.893, 0.894,
       0.738, 0.829, 0.856, 0.873, 0.872, 0.65 , 0.671, 0.729, 0.791,
       0.891, 0.69 , 0.804, 0.795, 0.809, 0.812, 0.615, 0.562, 0.593,
       0.614, 0.639, 0.655, 0.67 , 0.813, 0.837, 0.839, 0.805, 0.88 ,
       0.822, 0.575, 0.647, 0.777, 0.748, 0.877, 0.919, 0.922, 0.82 ,
       0.905, 0.907, 0.625, 0.628, 0.917, 0.931, 0.94 , 0.941, 0.942,
       0.944, 0.714, 0.564, 0.579, 0.604, 0.646, 0.668, 0.669, 0.677,
       0.84 , 0.843, 0.676, 0.844, 0.841, 0.703, 0.751, 0.691, 0.697,
       0.757, 0.771, 0.736, 0.743, 0.767, 0.763, 0.876, 0.613, 0.643,
       0.651, 0.659, 0.663, 0.725, 0.845, 0.597, 0.692, 0.707, 0.709,
       0.901, 0.904, 0.846, 0.924, 0.925, 0.928, 0.539, 0.572, 0.684,
       0.726, 0.673, 0.688, 0.913, 0.667, 0.79 , 0.594, 0.661, 0.675])

Exploration (gdp_for_year ($)) Feature¶

In [29]:
df['gdp_for_year_($)'].unique()
Out[29]:
array(['2,156,624,900', '2,126,000,000', '2,335,124,988', ...,
       '51,821,573,338', '57,690,453,461', '63,067,077,179'], dtype=object)

Exploration (gdp_per_capita_($)) Feature¶

In [30]:
df['gdp_per_capita_($)'].unique()
Out[30]:
array([ 796,  769,  833, ..., 1964, 2150, 2309], dtype=int64)
In [31]:
sns.histplot(data=df, x='gdp_per_capita_($)')
Out[31]:
<AxesSubplot: xlabel='gdp_per_capita_($)', ylabel='Count'>

Exploration (generation) Feature¶

In [32]:
df['generation'].unique()
Out[32]:
array(['Generation X', 'Silent', 'G.I. Generation', 'Boomers',
       'Millenials', 'Generation Z'], dtype=object)
In [33]:
sns.catplot(data=df, x="generation", kind="count", palette="ch:.25")
Out[33]:
<seaborn.axisgrid.FacetGrid at 0x224f19eadf0>

Observation of Features¶

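After examining each feature, the box plots show many points beyond the whiskers for suicides_no, population, and suicides/100k_pop, and the histogram of gdp_per_capita_($) is strongly right-skewed. These outliers do not appear to be data errors: they most likely correspond to populous countries and genuinely high-risk groups, and they follow the overall trends in the dataset, so they are kept for the analysis.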
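One way to back up this observation with numbers is to count, for each numeric column, the values that fall beyond the box-plot whiskers. The sketch below assumes seaborn's default whisker rule (1.5 times the IQR beyond the quartiles); `count_iqr_outliers` is a hypothetical helper and the toy frame is made up for illustration.

```python
import pandas as pd

def count_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in every numeric column.

    k = 1.5 is assumed to match the default whisker length of seaborn's box plots.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # The bounds are Series indexed by column name, so the comparisons align per column
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()

# Toy data (made up): one extreme value in 'a', none in 'b'; the text column is ignored
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 100],
    "b": [10, 11, 12, 13, 14],
    "label": ["v", "w", "x", "y", "z"],
})
print(count_iqr_outliers(toy))
```

On the notebook's data, `count_iqr_outliers(df)` would report one count per numeric column (including `year`, which can simply be ignored).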

3. Handling Missing Values¶

In [34]:
df.isna().mean() * 100  # percentage of missing values per column
Out[34]:
country                0.000000
year                   0.000000
sex                    0.000000
age                    0.000000
suicides_no            0.000000
population             0.000000
suicides/100k_pop      0.000000
HDI_for_year          69.935298
gdp_for_year_($)       0.000000
gdp_per_capita_($)     0.000000
generation             0.000000
dtype: float64

Dropping the 'HDI_for_year' column: about 70% of its values (19,456 of 27,820 rows) are missing, which is too many to impute reliably.
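The same decision can be written as a reusable rule: drop every column whose share of missing values exceeds a chosen threshold. A minimal sketch; the helper name `drop_sparse_columns` and its 50% default are assumptions for illustration, not values used elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Return df without the columns whose fraction of missing values exceeds max_missing."""
    missing_frac = df.isna().mean()  # share of NaN values in each column
    keep = missing_frac[missing_frac <= max_missing].index
    return df[keep]

# Toy data (made up): 'hdi' is 75% missing, 'gdp' is complete
toy = pd.DataFrame({
    "hdi": [np.nan, np.nan, 0.7, np.nan],
    "gdp": [1.0, 2.0, 3.0, 4.0],
})
print(drop_sparse_columns(toy).columns.tolist())  # ['gdp']
```

With `HDI_for_year` at about 70% missing, any threshold of 0.69 or lower would drop it, while complete columns are always kept.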

In [35]:
df.drop('HDI_for_year', axis=1, inplace=True)
In [36]:
df.isna().sum() 
Out[36]:
country               0
year                  0
sex                   0
age                   0
suicides_no           0
population            0
suicides/100k_pop     0
gdp_for_year_($)      0
gdp_per_capita_($)    0
generation            0
dtype: int64

4. Feature Engineering¶

4.1 Creating New Features¶

  • suicides_rate: 'suicides/100k_pop' divided by 100, which gives the number of suicides per 1,000 people in the group. This is not a percentage; a percentage would be 'suicides/100k_pop' divided by 1,000.
  • gdp_total: 'gdp_per_capita_($)' multiplied by 'population'. Because 'population' is the size of a single sex/age group, this approximates the economic output attributable to that group, not the country-year total (which is 'gdp_for_year_($)').
  • suicides_per_gdp: 'suicides_no' divided by 'gdp_total', i.e., suicides per dollar of the group's approximate GDP. Relates suicides to the group's economic output.
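A quick arithmetic check of these definitions, using the first row of the dataset as shown by `df.head()` (Albania, 1987, males aged 15-24 years). It shows that dividing a per-100,000 rate by 100 yields a rate per 1,000 people, whereas a true percentage would divide by 1,000. The two helper functions are hypothetical names introduced for this sketch.

```python
def per_100k_to_per_1000(rate_per_100k: float) -> float:
    """x suicides per 100,000 people equals x / 100 suicides per 1,000 people."""
    return rate_per_100k / 100


def per_100k_to_percent(rate_per_100k: float) -> float:
    """As a share of the group: (x / 100,000) * 100 = x / 1,000 percent."""
    return rate_per_100k / 1000


# First row of the dataset: Albania, 1987, male, 15-24 years
rate_100k, suicides, population, gdp_per_capita = 6.71, 21, 312_900, 796

print(f"{per_100k_to_per_1000(rate_100k):.4f}")  # 0.0671  (the value stored in suicides_rate)
print(f"{per_100k_to_percent(rate_100k):.5f}")   # 0.00671 (the actual percentage)

gdp_total = gdp_per_capita * population          # GDP approximation for this one group
print(gdp_total)                                 # 249068400
print(f"{suicides / gdp_total:.6e}")             # 8.431419e-08 (suicides_per_gdp)
```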
In [37]:
# Calculate 'suicides_rate': suicides per 1,000 people (per-100k rate / 100)
df['suicides_rate'] = df['suicides/100k_pop'] / 100
In [38]:
# Calculate 'gdp_total': approximate GDP of each sex/age group (per-capita GDP x group population)
df['gdp_total'] = df['gdp_per_capita_($)'] * df['population']
In [39]:
# Calculate 'suicides_per_gdp'
df['suicides_per_gdp'] = df['suicides_no'] / df['gdp_total']

Fixing the 'gdp_for_year_($)' feature: removing the thousands separators (commas) and converting the values to integers.¶

In [40]:
# Vectorized string cleanup instead of a row-by-row lambda
df['gdp_for_year_($)'] = df['gdp_for_year_($)'].str.replace(',', '', regex=False).astype('int64')
In [41]:
df['gdp_for_year_($)'].unique()
Out[41]:
array([ 2156624900,  2126000000,  2335124988, ..., 51821573338,
       57690453461, 63067077179], dtype=int64)
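Instead of cleaning the strings after loading, pandas can remove the thousands separators while reading the file through the `thousands` parameter of `pd.read_csv`. A minimal sketch on an in-memory CSV holding two values from the output above; for the real file the call would be `pd.read_csv('master.csv', thousands=',')`, which assumes no other column uses commas inside its numbers.

```python
import io

import pandas as pd

# In-memory CSV with two quoted GDP values taken from the output above
csv_text = (
    'gdp_for_year ($)\n'
    '"2,156,624,900"\n'
    '"2,126,000,000"\n'
)

sample = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(sample["gdp_for_year ($)"].tolist())  # [2156624900, 2126000000]
print(sample["gdp_for_year ($)"].dtype)
```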
In [42]:
sns.boxplot(x=df['gdp_for_year_($)'])
Out[42]:
<AxesSubplot: xlabel='gdp_for_year_($)'>
In [43]:
df.columns
Out[43]:
Index(['country', 'year', 'sex', 'age', 'suicides_no', 'population',
       'suicides/100k_pop', 'gdp_for_year_($)', 'gdp_per_capita_($)',
       'generation', 'suicides_rate', 'gdp_total', 'suicides_per_gdp'],
      dtype='object')
In [44]:
df.head()
Out[44]:
country year sex age suicides_no population suicides/100k_pop gdp_for_year_($) gdp_per_capita_($) generation suicides_rate gdp_total suicides_per_gdp
0 Albania 1987 male 15-24 years 21 312900 6.71 2156624900 796 Generation X 0.0671 249068400 8.431419e-08
1 Albania 1987 male 35-54 years 16 308000 5.19 2156624900 796 Silent 0.0519 245168000 6.526137e-08
2 Albania 1987 female 15-24 years 14 289700 4.83 2156624900 796 Generation X 0.0483 230601200 6.071087e-08
3 Albania 1987 male 75+ years 1 21800 4.59 2156624900 796 G.I. Generation 0.0459 17352800 5.762759e-08
4 Albania 1987 male 25-34 years 9 274300 3.28 2156624900 796 Boomers 0.0328 218342800 4.121959e-08
In [45]:
df.describe()
Out[45]:
year suicides_no population suicides/100k_pop gdp_for_year_($) gdp_per_capita_($) suicides_rate gdp_total suicides_per_gdp
count 27820.000000 27820.000000 2.782000e+04 27820.000000 2.782000e+04 27820.000000 27820.000000 2.782000e+04 2.782000e+04
mean 2001.258375 242.574407 1.844794e+06 12.816097 4.455810e+11 16866.464414 0.128161 3.713721e+10 3.208183e-08
std 8.469055 902.047917 3.911779e+06 18.961511 1.453610e+12 18887.576472 0.189615 1.333012e+11 9.682094e-08
min 1985.000000 0.000000 2.780000e+02 0.000000 4.691962e+07 251.000000 0.000000 2.131500e+05 0.000000e+00
25% 1995.000000 3.000000 9.749850e+04 0.920000 8.985353e+09 3447.000000 0.009200 5.641520e+08 8.335782e-10
50% 2002.000000 25.000000 4.301500e+05 5.990000 4.811469e+10 9372.000000 0.059900 3.390442e+09 5.143449e-09
75% 2008.000000 131.000000 1.486143e+06 16.620000 2.602024e+11 24874.000000 0.166200 1.999500e+10 2.154262e-08
max 2016.000000 22338.000000 4.380521e+07 224.970000 1.812071e+13 126352.000000 2.249700 2.515602e+12 2.905276e-06

5. Findings and Recommendations¶

  • This section addresses the following questions to explore the factors associated with suicide rates and to derive actionable recommendations.

1- Does the suicide rate differ significantly between age groups, and is there a notable difference based on gender?¶

In [46]:
suicide_age_data = df.groupby(['age', 'sex']).agg({
    'suicides_rate':'mean'
    }).reset_index().sort_values(by='suicides_rate')
fig = px.bar(suicide_age_data, x='age', y='suicides_rate', color='sex',
            barmode='group', title='Suicide Rates by Age Category and Gender')
fig.show()

The visualization shows that the oldest age group (75+ years) has the highest mean suicide rate, above both adults and young people. This suggests that people may face more difficulties and despair as they age, although the data alone cannot show why. Males have a higher suicide rate than females in every age group, pointing to a possible difference in mental health challenges between the sexes. These findings highlight the importance of targeted support for older people and of addressing the mental health issues that affect each sex.
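The bars average `suicides_rate` over rows, so every country, year, and group counts equally: a small country with a handful of suicides weighs as much as a large one. A pooled rate (total suicides divided by total population in each age/sex cell) is an alternative worth comparing. A minimal sketch with made-up toy numbers; `pooled_rate_per_1000` is a hypothetical helper.

```python
import pandas as pd

def pooled_rate_per_1000(df: pd.DataFrame, by: list[str]) -> pd.Series:
    """Total suicides / total population per group, expressed per 1,000 people."""
    totals = df.groupby(by)[["suicides_no", "population"]].sum()
    return totals["suicides_no"] / totals["population"] * 1000

# Toy data (made up): a tiny country with a high rate and a large one with a low rate
toy = pd.DataFrame({
    "age": ["75+ years", "75+ years", "15-24 years", "15-24 years"],
    "sex": ["male"] * 4,
    "suicides_no": [1, 300, 5, 50],
    "population": [1_000, 1_000_000, 100_000, 1_000_000],
})
# Same units as the notebook's suicides_rate (suicides per 1,000 people)
toy["suicides_rate"] = toy["suicides_no"] / toy["population"] * 1000

comparison = pd.DataFrame({
    "row_mean": toy.groupby(["age", "sex"])["suicides_rate"].mean(),
    "pooled": pooled_rate_per_1000(toy, ["age", "sex"]),
})
print(comparison)
```

In the toy 75+ cell the row mean (0.65) is more than twice the pooled rate, because the tiny country counts as much as the large one. In the notebook, `pooled_rate_per_1000(df, ['age', 'sex'])` gives the pooled counterpart of the plotted means.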

2- Is there a correlation between GDP per capita and suicide rates? Do countries with higher GDP per capita have lower suicide rates over the years?¶

In [47]:
data_GDP_suicide_rate = df.groupby(["country", "year"]).agg({
    "gdp_per_capita_($)": "mean",
    "suicides_rate": "mean"
}).reset_index()

fig = px.scatter(data_GDP_suicide_rate, x='gdp_per_capita_($)', y="suicides_rate", color='year', trendline="ols")
fig.update_layout(title="GDP per Capita vs. Suicide Rate (Over Time)",
                  xaxis_title="GDP per Capita ($)",
                  yaxis_title="Suicide Rate (per 1,000 population)")

The visualization shows a negative correlation between GDP per capita and suicide rate: higher GDP per capita is associated with lower suicide rates. The graph also shows GDP per capita rising over time, alongside a decrease in suicide rates. These associations are consistent with economic development supporting mental well-being, although correlation alone does not establish cause. Policymakers and organizations may therefore consider improving living standards as one component of suicide prevention and population mental health.
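The OLS trendline shows a direction but not how strong the association is. A minimal sketch that quantifies it on a country-year table such as `data_GDP_suicide_rate`, using Pearson (linear) and Spearman (rank-based, less sensitive to the strong right skew of GDP per capita). Spearman is computed here as the Pearson correlation of the ranks, which avoids a SciPy dependency; the helper name and the toy values are illustrative assumptions.

```python
import pandas as pd

def gdp_rate_correlations(table: pd.DataFrame,
                          x: str = "gdp_per_capita_($)",
                          y: str = "suicides_rate") -> dict:
    """Pearson and Spearman correlation between columns x and y of a country-year table."""
    pearson = table[x].corr(table[y])  # default method is Pearson
    # Spearman = Pearson on ranks (ties get their average rank)
    spearman = table[x].rank().corr(table[y].rank())
    return {"pearson": pearson, "spearman": spearman}

# Toy data (made up): the rate falls as GDP per capita rises, but not linearly
toy = pd.DataFrame({
    "gdp_per_capita_($)": [1_000, 2_000, 4_000, 8_000, 16_000],
    "suicides_rate": [0.30, 0.20, 0.15, 0.12, 0.11],
})
print(gdp_rate_correlations(toy))
```

Run it right after the cell above, since the next question reuses the name `data_GDP_suicide_rate` for a different table; for that table, pass `x='gdp_for_year_($)'`. Values close to zero would indicate a weak association even if the trendline slopes downward.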

3- Is there a correlation between a country's GDP for a given year and its suicide rates? Do countries with higher GDPs tend to have lower suicide rates over the years?¶

In [48]:
data_GDP_suicide_rate = df.groupby(["country", "year"]).agg({
    "gdp_for_year_($)": "mean",
    "suicides_rate": "mean"
}).reset_index()

fig = px.scatter(data_GDP_suicide_rate, x='gdp_for_year_($)', y="suicides_rate", color='year', trendline="ols")
fig.update_layout(title="GDP for Year vs. Suicide Rate (Over Time)",
                  xaxis_title="GDP for Year ($)",
                  yaxis_title="Suicide Rate (per 1,000 population)")

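The visualization reveals a negative relationship between a country's GDP for a given year and its suicide rate: countries with lower GDP tend to have higher suicide rates, while those with higher GDP tend to have lower rates. Because total GDP grows with a country's population as well as its wealth, this plot mixes economic size with prosperity; GDP per capita (question 2) is the more direct measure of living standards. The coloring by year also suggests that suicide rates improved over time for some countries.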

4- What are the trends in suicide rates over the years, and how do they vary by country, sex, and age group?¶

In [49]:
# Create subplots with 3 rows and 1 column
fig = make_subplots(rows=3, cols=1,
                    subplot_titles=("By Age Group", "By Country (Top 10)", "By Sex"),  vertical_spacing=0.15)
#Age graph
data_age = df.groupby(["year", "age"])["suicides_rate"].mean().reset_index()

fig_age = px.line(data_age, x="year", y="suicides_rate", color="age", title="By Age Group")
for trace in fig_age["data"]:
    fig.add_trace(trace, row=1, col=1)

#Country graph
data_country = df.groupby(["year", "country"])["suicides_rate"].mean().reset_index()
top_10_countries = data_country.groupby("country")["suicides_rate"].mean().nlargest(10).index
data_country_top10 = data_country[data_country["country"].isin(top_10_countries)]

fig_country = px.line(data_country_top10, x="year", y="suicides_rate", color="country",
                      title="By Country (Top 10)")
for trace in fig_country["data"]:
    fig.add_trace(trace, row=2, col=1)

#Sex graph
data_sex = df.groupby(["year", "sex"])["suicides_rate"].mean().reset_index()

fig_sex = px.line(data_sex, x="year", y="suicides_rate", color="sex", title="By Sex")
for trace in fig_sex["data"]:
    fig.add_trace(trace, row=3, col=1)
    
fig.update_layout(height=800, showlegend=True)

# Without row/col arguments, these updates apply to the axes of all three subplots
fig.update_xaxes(title_text="Year")
fig.update_yaxes(title_text="Suicide Rate (per 1,000)")

fig.show()

Age Graph: Suicide rates increase with age, which suggests that older individuals may face more challenges and despair. However, the rate for each age group remains relatively stable over the years.

Country Graph: The graph shows the 10 countries with the highest average suicide rates. Their rates rise from around 1990, possibly influenced by upheaval such as war or economic hardship, and then decline after about 2005, which may reflect efforts to address these issues; the plot alone cannot establish either cause.

Sex Graph: The graph reveals a large gap between the sexes, with men exhibiting higher suicide rates. Factors such as depression and societal pressures may contribute to this gap, emphasizing the need for targeted mental health support for men.
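The rise after 1990 and the decline after 2005 are read off the plot by eye. Fitting a straight line to each country's yearly mean rate within a window turns them into numbers (change in rate per year). A minimal sketch; `rate_slopes` is a hypothetical helper, the window boundaries are assumptions to adjust, and the toy values are made up.

```python
import numpy as np
import pandas as pd

def rate_slopes(df: pd.DataFrame, group: str, start: int, end: int) -> pd.Series:
    """Least-squares slope of the yearly mean suicides_rate per group within [start, end].

    A negative slope means the rate fell over the window; units are rate per year.
    """
    window = df[df["year"].between(start, end)]
    yearly = window.groupby([group, "year"])["suicides_rate"].mean().reset_index()
    slopes = {}
    for name, g in yearly.groupby(group):
        if len(g) >= 2:  # a line needs at least two years
            # np.polyfit returns [slope, intercept] for a degree-1 fit
            slopes[name] = np.polyfit(g["year"], g["suicides_rate"], deg=1)[0]
    return pd.Series(slopes, name="slope_per_year").sort_values()

# Toy data (made up): country A declines, country B rises
toy = pd.DataFrame({
    "country": ["A"] * 3 + ["B"] * 3,
    "year": [2005, 2006, 2007] * 2,
    "suicides_rate": [0.30, 0.28, 0.26, 0.10, 0.11, 0.12],
})
print(rate_slopes(toy, "country", 2005, 2016))
```

For the countries plotted above, something like `rate_slopes(df[df['country'].isin(top_10_countries)], 'country', 2005, 2016)` would list the post-2005 slopes, with negative values indicating a decline.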

5- How does population size relate to suicide rates in the 20 most populous countries? Are more populous countries more prone to higher suicide rates?¶

In [50]:
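# Sum the sex/age groups to get each country's total population per year;
# the suicide rate is the unweighted mean over those groups
data_population_suicide_rate = df.groupby(["country", "year"]).agg(
    population=("population", "sum"),
    suicides_rate=("suicides_rate", "mean")
).reset_index()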

top_20_countries = data_population_suicide_rate.groupby("country")["population"].mean().nlargest(20).index
data_population_suicide_rate_top20 = data_population_suicide_rate[data_population_suicide_rate["country"].isin(top_20_countries)]

fig = px.scatter(data_population_suicide_rate_top20, x="population", y="suicides_rate", color='country',
                 labels={"population": "Population Size", "suicides_rate": "Suicide Rate (per 1,000 population)"},
                 title="Population Size vs. Suicide Rate (Top 20 Countries)")

fig.show()

The visualization of the 20 most populous countries shows minimal correlation between population size and suicide rate. This indicates that population size alone does not play a significant role in suicide rates (the dataset has no land area, so population density cannot be assessed directly). Other factors, such as socioeconomic conditions, mental health awareness, and cultural influences, may have a more substantial impact on suicide rates.

6- Is there a significant difference in suicide rates across generations and age groups?¶

In [51]:
suicide_age_data = df.groupby(['age', 'generation']).agg({
    'suicides_rate':'mean'
    }).reset_index().sort_values(by='suicides_rate')
fig = px.bar(suicide_age_data, x='age', y='suicides_rate', color='generation',
            barmode='group', title='Suicide Rates by Age Category and Generation')
fig.show()

The visualization shows that the G.I. Generation and the Silent Generation have higher suicide rates than other generations. This could be because they lived through difficult times such as World War I, the Great Depression, and World War II, which may have led to more feelings of despair. Generation Z, Generation X, and Millennials have lower suicide rates, possibly because they grew up during technological advances and positive societal changes that provided more support for their mental well-being. Note, however, that generation and age are closely linked in this dataset: the G.I. Generation is observed mainly in the oldest age groups and Generation Z mainly in the youngest, so part of these differences likely reflects age rather than generation.
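Because the generation label appears to be derived from age group and year, many generation/age combinations never occur, which makes it hard to separate the two effects. A contingency table makes the overlap visible; on the notebook's data the call would be `generation_age_table(df)`. A minimal sketch with made-up rows; the helper name is hypothetical.

```python
import pandas as pd

def generation_age_table(df: pd.DataFrame) -> pd.DataFrame:
    """Row counts for every generation/age combination; zeros mark pairs that never occur."""
    return pd.crosstab(df["generation"], df["age"])

# Toy rows (made up) using the dataset's category labels
toy = pd.DataFrame({
    "generation": ["G.I. Generation", "G.I. Generation", "Generation Z", "Millenials"],
    "age": ["75+ years", "55-74 years", "5-14 years", "5-14 years"],
})
table = generation_age_table(toy)
print(table)
```

Cells that are zero in the real table are comparisons the bar chart cannot make, which is why a generation effect and an age effect are hard to tell apart here.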

7- How does GDP per capita vary over the years across the 10 countries with the highest suicide rates?¶

In [52]:
top_10_countries = df.groupby('country')['suicides_rate'].mean().nlargest(10).index

filtered_data = df[df['country'].isin(top_10_countries)]

data_gdp_per_capita = filtered_data.groupby(['country', 'year'])['gdp_per_capita_($)'].mean().reset_index()

fig = px.line(data_gdp_per_capita, x='year', y='gdp_per_capita_($)', color='country',
              title='GDP per Capita Variation Across Top 10 Countries with Highest Suicide Rates')

fig.show()

The visualization shows how GDP per capita changed from 1985 to 2015 in these 10 countries. Between 1985 and 2000, GDP per capita stayed fairly flat, meaning there was little change in economic output per person. From 2000 onward, GDP per capita increased noticeably, suggesting that the economies of these countries began to grow.

8- How does the suicide rate vary with GDP per capita across generations?¶

In [53]:
# Group the data by generation, year, and calculate average values
grouped_data = df.groupby(['generation', 'year']).agg({
    'suicides_rate': 'mean',
    'gdp_per_capita_($)': 'mean'
}).reset_index()

# Scatter plot: one point per (generation, year) pair, colored by generation
fig = px.scatter(grouped_data, x='gdp_per_capita_($)', y='suicides_rate', color='generation',
                 labels={'gdp_per_capita_($)': 'GDP per Capita ($)', 'suicides_rate': 'Suicide Rate (per 1,000)'},
                 title='Suicide Rate vs. GDP per Capita by Generation (Over Time)')

fig.show()

The visualization shows that the G.I. Generation had lower GDP per capita, which might help explain its higher suicide rate. Later generations had higher GDP per capita and tended to have lower suicide rates; Generation Z had the highest GDP per capita and the lowest suicide rate. This suggests that greater prosperity may be related to better mental well-being across generations. One caveat: the generations are observed over different periods (Generation Z mainly in recent years), so part of the GDP difference reflects when each generation was observed, and part of the rate difference reflects age.
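If the generations are observed in different stretches of calendar time, part of their GDP per capita gap reflects when they were measured rather than anything specific to the generation. The span of years covered by each generation is a quick check; `generation_year_span` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd

def generation_year_span(df: pd.DataFrame) -> pd.DataFrame:
    """First year, last year, and mean year of the rows belonging to each generation."""
    return df.groupby("generation")["year"].agg(["min", "max", "mean"])

# Toy rows (made up) using the dataset's category labels
toy = pd.DataFrame({
    "generation": ["G.I. Generation", "G.I. Generation", "Generation Z", "Generation Z"],
    "year": [1985, 1995, 2008, 2016],
})
span = generation_year_span(toy)
print(span)
```

On the notebook's data, `generation_year_span(df)` shows whether the generations' observation windows overlap enough for a fair GDP comparison.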

Suicide Rate by Country Map¶

In [54]:
data = df.groupby('country')['suicides_rate'].mean().reset_index()

fig = go.Figure(data=go.Choropleth(
    locations=data['country'],
    locationmode='country names',
    z=data['suicides_rate'],
    colorscale='reds',
    colorbar_title='Suicide Rate',
))

fig.update_layout(
    title='Suicide Rate by Country',
    geo=dict(showframe=False, showcoastlines=False, projection_type='equirectangular'),
)

fig.show()

The map shows high suicide rates in the Russian Federation and neighboring Eastern European countries, which could be attributed to historical events such as wars or periods of economic hardship. In contrast, countries such as Australia and the United States exhibit relatively lower suicide rates. These variations may be influenced by a combination of historical, cultural, and socioeconomic factors that affect mental health and well-being. Countries that are not in the dataset (for example, China and India) are left uncolored on the map, which does not indicate a low rate.

Feature Correlation Summary¶

In [55]:
cols = ['suicides_no', 'population', 'suicides/100k_pop',
        'gdp_for_year_($)', 'gdp_per_capita_($)', 'suicides_rate',
        'gdp_total', 'suicides_per_gdp']

correlation_matrix = df[cols].corr()

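# Two decimals keep the annotations readable. Note that 'suicides_rate' is a rescaled copy
# of 'suicides/100k_pop', so those two columns always correlate perfectly
fig = px.imshow(correlation_matrix, text_auto='.2f')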

fig.update_layout(title="Correlation Matrix",
                  xaxis=dict(title="Columns"),
                  yaxis=dict(title="Columns"))
fig.show()

6. Conclusion¶

The visualizations and analysis provide insights into suicide rates, GDP per capita, and their relationships. Older individuals have the highest suicide rates, emphasizing the need for targeted support for this age group. Males exhibit higher suicide rates across all age groups, highlighting the importance of addressing gender-specific mental health challenges. The negative association between GDP per capita and suicide rates is consistent with economic development supporting mental well-being, although the data are observational and cannot establish causation. Suicide rates also vary widely among countries, which may reflect historical events and socioeconomic factors. Overall, the findings suggest that improving living standards and mental health support may contribute to suicide prevention.